feat(config): resolve relative embedding.model against config dir / source_root #323

Merged
HumanBean17 merged 1 commit into master from feat/embedding-model-relative-path on Jun 14, 2026

@HumanBean17

Owner

Summary

Makes embedding.model resolve relative paths and environment-variable paths deterministically so a .java-codebase-rag.yml can be committed to a real project and work portably across machines and across the CLI indexer vs the MCP reader.

A relative embedding.model (e.g. ./models/minilm) was previously handed to sentence-transformers verbatim and resolved against process CWD — unlike index_dir and source_root, which already anchor on the config file's directory. That made a committed config non-portable, and could let the CLI indexer and the MCP reader load different models from the same config when their CWDs differed (silent vector-dimension mismatch).

Changes

  • java_codebase_rag/config.py: maybe_expand_embedding_model_path gains optional config_dir / source_root / source kwargs. After the existing ~ / $VAR expansion, a result that still starts with ./ or ../ is resolved to an absolute path, mirroring _resolve_index_dir_path:
    • YAML model → resolves against the config file's directory
    • SBERT_MODEL / --embedding-model → resolves against the resolved source_root
    • Hub ids (org/name), absolute paths, ~/-expanded values, and an env var that already yielded an absolute path are all left untouched.
    • With no base supplied (the MCP runtime read via resolved_sbert_model_for_process_env), relative resolution is skipped, so that path is byte-for-byte unchanged. The #239 MCP YAML-loading fix ("fix: MCP server loads embedding config from .java-codebase-rag.yml (#238)") is untouched.
  • tests/test_config.py — 12 new tests (4 integration through resolve_operator_config, 8 unit on the helper).
  • docs/CONFIGURATION.md — updated the embedding.model YAML comment block and the Path-expansion table row.
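
The resolution order described above can be sketched roughly as follows. This is a minimal illustration and not the code in java_codebase_rag/config.py: the function name `resolve_model_path_sketch`, the `source` values `"yaml"` / `"cli"` / `"env"`, the order of the two expansion calls, and the use of `os.path.normpath` (which does not follow symlinks) are all assumptions made for the example. It also assumes POSIX-style paths.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_model_path_sketch(
    raw: str,
    *,
    config_dir: Optional[Path] = None,
    source_root: Optional[Path] = None,
    source: str = "yaml",  # assumed values: "yaml", "cli", "env"
) -> str:
    """Hypothetical stand-in for maybe_expand_embedding_model_path."""
    # Step 1: the pre-existing ~ / $VAR / ${VAR} expansion.
    expanded = os.path.expanduser(os.path.expandvars(raw))

    # Step 2: only values that are still explicitly relative are candidates.
    # Hub ids (org/name), absolute paths, ~-expanded values, and env vars
    # that already yielded an absolute path fall through unchanged.
    if not expanded.startswith(("./", "../")):
        return expanded

    # Step 3: pick the anchor. YAML values use the config file's directory;
    # CLI / env values use the resolved source_root.
    base = config_dir if source == "yaml" else source_root
    if base is None:
        # No base supplied: skip relative resolution entirely.
        return expanded

    return os.path.normpath(os.path.join(base, expanded))
```

Under these assumptions, `resolve_model_path_sketch("./models/minilm", config_dir=Path("/repo"))` yields `/repo/models/minilm` regardless of the process CWD, while the same call without a base returns the input unchanged.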

User-visible behavior changes

  • A relative embedding.model now loads from a deterministic absolute path (config dir for YAML, source_root for CLI/env) instead of an unreliable CWD.
  • $VAR / ${VAR} interpolation (already supported) is preserved.
  • No change for hub ids, absolute paths, or ~/... values.

Reindex / env / ontology

  • No ontology bump, no env-var change.
  • Reindex caveat: if you previously committed a relative embedding.model that only worked because CWD lined up, the resolved path now anchors on the config dir. If the old CWD and the config dir differ, run a one-time java-codebase-rag reprocess --vectors-only. Users with absolute / hub-id / ~ / $VAR model values need no reindex.
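
One way to tell whether the caveat applies is to compare where a relative value used to resolve (the CWD the indexer was run from) with where it resolves now (the config file's directory). The helper below is a hypothetical sketch, not part of java-codebase-rag. It assumes POSIX-style paths and only inspects the raw `./` / `../` prefix, so it does not cover a `$VAR` that itself expands to a relative path.

```python
import os
from pathlib import Path


def relative_model_moved(raw_model: str, old_cwd: Path, config_dir: Path) -> bool:
    """Return True when a ./ or ../ model value resolves to a different
    absolute path under config-dir anchoring than it did under old_cwd."""
    if not raw_model.startswith(("./", "../")):
        # Absolute paths, hub ids, and ~ values are unaffected by the change.
        return False
    before = os.path.normpath(os.path.join(old_cwd, raw_model))
    after = os.path.normpath(os.path.join(config_dir, raw_model))
    return before != after
```

For example, `relative_model_moved("./models/minilm", Path("/repo/services/api"), Path("/repo"))` is `True`, which is the case where the one-time `java-codebase-rag reprocess --vectors-only` mentioned above is worth running.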

Validation

No propose/plan doc — bounded, single-file-behavior change.

🤖 Generated with Claude Code

…ource_root

A relative `embedding.model` (e.g. `./models/minilm`) was handed to
sentence-transformers verbatim and resolved against process CWD — unlike
`index_dir` and `source_root`, which anchor on the config file's
directory. That made a committed `.java-codebase-rag.yml` non-portable,
and could let the CLI indexer and the MCP reader load different models
from the same config when their CWDs differed.

`maybe_expand_embedding_model_path` now takes optional `config_dir` /
`source_root` / `source` kwargs and, after the existing `~` / `$VAR`
expansion, resolves a result still `./` / `../`-prefixed to absolute —
mirroring `_resolve_index_dir_path`: YAML values anchor on the config
file's directory, CLI / env values on `source_root`. Hub ids, absolute
paths, `~/`-expanded values, and an env var that already yielded an
absolute path are all left untouched.

With no base supplied, relative resolution is skipped, so the MCP
runtime read (`resolved_sbert_model_for_process_env`) is byte-for-byte
unchanged — it receives an already-absolute path from
`apply_to_os_environ` in the normal flow. The #239 MCP YAML-loading fix
is untouched.

Co-Authored-By: Claude <noreply@anthropic.com>
HumanBean17 merged commit a938083 into master on Jun 14, 2026
1 check passed
HumanBean17 added a commit that referenced this pull request Jun 15, 2026
Catch-up: master advanced (#322 installer cross_service_resolution,
#323 config embedding.model resolution, #325 version 0.6.2, #326 PR-1
progress.py) while the index-output-rework stack was based on #320.
This merges those in so the catch-up PR (#330) carries only PR-2/3/4.

Conflicts resolved (both add/add, feature branch is the superset):
- java_codebase_rag/progress.py  (master had PR-1 state; branch has
  PR-1 + CallbackRenderer/make_relay/build_index_progress_context)
- tests/test_progress.py         (master had PR-1's 14 tests; branch
  adds PR-2/3/4 tests)

Auto-merged cleanly: installer.py (#322 + PR-4), pyproject.toml
(version 0.6.2 + rich>=14,<15), tests/test_installer.py.

Verified: ruff clean; full suite 833 passed, 13 skipped (heavy-gated).

Co-Authored-By: Claude <noreply@anthropic.com>